Introduction

Figure caption: Main causes of deforestation in central Menabe. a-a’: Slash-and-burn agriculture (“hatsake”) for peanut crop. Peanut (a’) is cultivated as a cash crop. Part of the production is at the destination of the national market but most of it is exported outside Madagascar, mainly for the Chinese market. b-b’: Slash-and-burn agriculture for maize crop. maize (b’) is cultivated for auto-consumption and as a cash crop. The production of maize is at the destination of the national market and is used in particular to brew the national beers. c-c’: Cyclone followed by uncontrolled fires. Cyclone “Fanele” (2009) caused tree mortality and accumulation of wood fuel on the ground. As a consequence, uncontrolled fires set on nearby pastures (c’) spread over large areas of forest after 2009. d-d’: Illegal logging. Timbers are used for house and boat construction.

This tutorial is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License.

Computing environment

Installing Docker

Docker is used to run software packages called “containers”. Containers are isolated from each other and bundle their own tools, libraries and configuration files. Containers are created from “images” that specify their precise contents.

To install Docker for your operating system, go to https://docs.docker.com/install/. For Windows user, go directly to https://docs.docker.com/docker-for-windows/install/.

Docker image

For our tutorial, we have built a specific Docker image including R and Python:

  • R computing environment: R 3.5.1, RStudio, and various R packages (ex. knitr, ggplot2, reticulate).
  • Python computing environment: Python 3, and various Python packages (ex. numpy, pandas, matplotlib, and forestatrisk).

Some computations, such as computation by blocks on large raster files, were more easily coded and faster with Python. That’s why we decided to use Python as a complementary language to R. The reticulate R package allow us to use Python functions from R and thus to work in a single computing environment within RStudio.

Load the forestatrisk Docker image

Download the image from internet with the following command in a terminal: docker pull ghislainv/docker-forestatrisk . (do not forget the final “.”)

Note: If the internet connection is slow, use a USB key and copy the image somewhere on your computer and load it with the following command: docker load -i <path to image tar file>

Run the container with docker run -d -e PASSWORD=PWD -e ROOT=YES -p 8787:8787 -v ORIGIN_FOLDER:/home/rstudio ghislainv/docker-forestatrisk.

Note:

  • Set your own password PWD, it cannot be rstudio.
  • Replace ORIGIN_FOLDER with a folder on your local machine where you want to save the R scripts. For example /c/Users/YOUR_NAME or /home/YOUR_NAME.

Point your browser to localhost:8787 to access RStudio interface. Log in with rstudio/PWD.

Get the tutorial

In RStudio, open a terminal and clone the GitHub repository locally:
git clone https://github.com/ghislainv/forecasting-deforestation-Mada

Navigate with RStudio in the folder named forecasting-deforestation-Mada and open the R Notebook file named training.Rmd. R Markdown files (*.Rmd) can be used in R to embbed text, code, table and figues inside a unique document. Code are organized into ‘chunks’ that can be executed independently and interactively. An R Notebook is an R Markdown document with output visible immediately beneath the input.

I used this notebook to write the tutorial available at https://ecology.ghislainv.fr/forecasting-deforestation-Mada.

In the preambule, change the author and date with your name and today’s date.

Importing Python modules in R

Specify the Python version to use and check that it is the right one.

use_python("/usr/bin/python3.5")
py_config()
python:         /usr/bin/python3.5
libpython:      /usr/lib/python3.5/config-3.5m-x86_64-linux-gnu/libpython3.5.so
pythonhome:     /usr:/usr
version:        3.5.3 (default, Sep 27 2018, 17:25:39)  [GCC 6.3.0 20170516]
numpy:          /usr/local/lib/python3.5/dist-packages/numpy
numpy_version:  1.15.3

python versions found: 
 /usr/bin/python3.5
 /usr/bin/python
 /usr/bin/python3

Import the Python modules into R.

far <- import("forestatrisk")
patsy <- import("patsy")
sm <- import("statsmodels.api")
smf <- import("statsmodels.formula.api")

R/Python intro

R

Some exercises
a = c(20,5,6,2,9,12)
b = 1
c = a + b
print(c)
[1] 21  6  7  3 10 13
Simple regression
# Generating data
nobs <- 100
x.seq <- runif(n=nobs,min=0,max=100)
a.target <- 2
b.target <- 2
sigma2.target <- 300
y.seq <- a.target + b.target*x.seq +
         rnorm(n=nobs,mean=0,sd=sqrt(sigma2.target))
# Data-set
Data <- as.data.frame(cbind(y.seq,x.seq))
# Plot
plot(x.seq,y.seq)

# Estimation
M <- lm(y.seq~x.seq,data=Data)
summary(M)

Call:
lm(formula = y.seq ~ x.seq, data = Data)

Residuals:
    Min      1Q  Median      3Q     Max 
-43.052 -13.131  -1.399  12.603  48.548 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  6.93329    3.78986   1.829   0.0704 .  
x.seq        1.96978    0.06754  29.167   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 18.91 on 98 degrees of freedom
Multiple R-squared:  0.8967,    Adjusted R-squared:  0.8956 
F-statistic: 850.7 on 1 and 98 DF,  p-value: < 2.2e-16
M$coefficients
(Intercept)       x.seq 
   6.933290    1.969783 
var(M$residuals)
[1] 354.0755
M$df
[1] 98
# Graphics 
x.pred <- seq(from=0,to=100,length.out=100) # We will predict y for 100 values of x
X.pred <- as.data.frame(x.pred)
names(X.pred) <- "x.seq"
Y.pred <- predict.lm(M,X.pred,interval="confidence",
                     type="response")
plot(x.seq,y.seq,col=grey(0.7),
     xlab="X",
     ylab="Y",
     main="Simple linear regression") # Row data
# Confidence envelops for predictive posterior
lines(x.pred,Y.pred[,1],col="red",lwd=3)
lines(x.pred,Y.pred[,2],col="blue",lwd=2,lty=2)
lines(x.pred,Y.pred[,3],col="blue",lwd=2,lty=2)

Python

Some exercises

You can execute Python code within a R notebooks.

import numpy as np
a = np.array([20,5,6,2,9,12])
b = 1
c = a + b
print(type(c))
print(c)
<class 'numpy.ndarray'>
[21  6  7  3 10 13]

Preparing the data-set

Downloading the geospatial data

To model the spatial probability of deforestation, we need a map of the past deforestation and maps of potential explicative environmental variables. Environmental variables can be derived from topography (altitude and slope), accessibility (distance to roads, towns, and forest edge), deforestation history (distance to previous deforestation) or landscape policy (protected area network) for example. In our example, we use the following variables :

library(kableExtra)
tab1 <- read.table("training/tables/variables.txt",sep=",",header=TRUE)
names(tab1) <- c("Product","Source","Variable derived","Unit","Resolution (m)")
# Kable
knitr::kable(tab1, linesep="") %>%
  kable_styling(bootstrap_options = c("striped"))
Product Source Variable derived Unit Resolution (m)
Deforestation maps (1990-2000-2010) BioSceneMada (1) forest/non-forest -- 30
distance to forest edge m 30
distance to previous deforestation m 30
Digital Elevation Model SRTM v4.1 CSI-CGIAR (2) altitude m 90
slope ° 90
Highways OSM - Geofabrik (3) distance to roads m 150
Places distance to towns m 150
Waterways distance to river m 150
Protected areas Rebioma (4) presence of protected area -- 30
  1. http://bioscenemada.cirad.fr, deforestation maps from Vieilledent et al. (2018)
  2. http://srtm.csi.cgiar.org, SRTM 90m Digital Elevation Database v4.1
  3. http://www.geofabrik.de, data extracts from the OpenStreetMap project for Madagascar,
  4. http://rebioma.net, SAPM (“Système des Aires Protégées à Madagascar”), 20/12/2010 version
# Make data directory
if (!dir.exists("data")) {
  dir.create("data")
  dir.create("data/model")
  dir.create("data/mada")
  dir.create("data/validation")
}
# Files
files <- c("fordefor2010.tif", "dist_edge.tif", "dist_defor.tif",
           "altitude.tif", "slope.tif", "aspect.tif",
           "dist_road.tif", "dist_town.tif", "dist_river.tif",
           "sapm.tif", "roads.kml", "towns.kml", "rivers.kml", "sapm.kml")
# Download model data
for (i in files) {
  if (!file.exists(glue("data/model/{i}"))) {
    f <- glue("https://zenodo.org/record/259582/files/{i}")
    curl_download(f, glue("data/model/{i}"), quiet=FALSE)
  }
}
# Download validation data
if (!file.exists("data/validation/for2017.tif")) {
  f <- "http://bioscenemada.cirad.fr/FileTransfer/for2017.tif"
  curl_download(f, "data/validation/for2017.tif", quiet=FALSE)
}
# Download mada outline
if (!file.exists("data/mada/mada38s.shp")) {
  f <- "http://bioscenemada.cirad.fr/FileTransfer/mada38s.zip"
  curl_download(f, "data/mada/mada38s.zip", quiet=FALSE)
  unzip("data/mada/mada38s.zip", exdir="data/mada")
}

In our example, fordefor2010.tif is a forest raster at 30m for the year 2010 considering the deforestation on the period 2000-2010 in Madagascar. We can plot this raster and zoom on a region with function .plot.forest() in the forestatrisk package to see the deforestation data. The remaining forest appears in green and the deforested areas appear in red.

# Plot forest cover change 2000-2010
fig <- far$plot$fcc(input_fcc_raster="data/model/fordefor2010.tif",
             output_file="output/fcc.png",
             col=c(255,0,0,255),  # rgba color for deforestation
             figsize=c(5,5),
             dpi=150,
             zoom=c(340000,412000,7420000,7500000))
knitr::include_graphics("output/fcc.png")

Sampling points

We use the function .sample() from the forestatrisk module to sample 10,000 points (pixel centers) in the deforested areas and in the remaining forest (20,000 points in total). The input_forest_raster argument defines the path to the forest raster including the deforested pixels (with value=0) and the remaining forest pixels (value=1) after a given period of time. The random seed can be set with argument Seed to reproduce the data of the random sampling.

The .sample() function also extracts information from environmental variables for each sampled point. Sampling is done by block to allow the computation on large study areas (e.g. country or continental scale) with a fine spatial resolution (e.g. 30m). The var_dir argument indicates the directory including all the environmental raster files (they must be GeoTIFF files with extension .tif) that we want to test.

The .sample() function identifies the spatial cell for each sample point (sample point are grouped within a spatial cell). Spatial cells and grouped observations are used to estimate the spatial autocorrelation of the deforestation process. The csize argument define the width (in km) of the square spatial cells. Each spatial cell will be attributed a parameter. To avoid estimating too many parameters, width of the square spatial cells must be sufficiently large. Both points sampling and extraction of environmental data are done by block to avoid memory problems for big datasets.

# Make output directory
if (!dir.exists("output")) {
  dir.create("output")
}
# Training data-set
if (!file.exists("output/sample.txt")) {
  samp <- far$sample(nsamp=10000L, Seed=1234L, csize=10L,
                     var_dir="data/model",
                     input_forest_raster="fordefor2010.tif",
                     output_file="output/sample.txt",
                     blk_rows=1L)
}
samp <- read.table("output/sample.txt", header=TRUE, sep=",")
set.seed(1234)
train <- sample(1:20000, size=10000, replace=FALSE)
data_train <- samp[train,] %>% dplyr::filter(complete.cases(.))
data_valid <- samp[-train,] %>% dplyr::filter(complete.cases(.))
head(data_train)

Sampled observations can be plotted using function .plot.obs() from the deforestprob module. Dark red dots indicate deforestation observations and dark green dots indicate forest observations.

# Plot sample points
fig <- far$plot$obs(sample=data_train,
             name_forest_var="fordefor2010",
             input_fcc_raster="data/model/fordefor2010.tif",
             output_file="output/obs.png",
             zoom=c(340000,412000,7420000,7500000),
             figsize=c(5,5),#c(11.69,8.27),
             s=5,dpi=300)
knitr::include_graphics("output/obs.png")

Descriptive statistics

Before modelling the deforestation, it is important to look at the relationship between environmental variables and deforestation. Using formulas from the patsy Python module, we can specify the relationships that we want to look at. In the example below, we plot the relationships between some continuous environmental variables and the probability of deforestation using function .plot.correlation() from the forestatrisk package. Note that -1 must be set at the end of the formula. The function .correlation() returns a serie of graphics that can be analyzed to choose the right relationship for each continuous variable (linear or polynomial for example).

# Descriptive statistics
# Model formulas
formula_1 <- paste0("fordefor2010 ~ dist_road + dist_town + dist_defor +",
                    "dist_river + dist_edge + altitude + slope + aspect - 1")
# Standardized variables (mean=0, std=1)
formula_2 <- paste0("fordefor2010 ~ scale(dist_road) + scale(dist_town) +",
                    "scale(dist_defor) + scale(dist_river) + scale(dist_edge) +",
                    "scale(altitude) + scale(slope) + scale(aspect) - 1")
formulas <- c(formula_1, formula_2)
# List to store figures
corr_fig <- list()
# Loop on formulas
for (f in 1:length(formulas)) {
    # Output file
    of <- glue("output/correlation_{f}.pdf")
    # Data
    dmat <- patsy$dmatrices(formulas[f], data=data_train, eval_env=-1L,
                                  return_type="dataframe")
    # Plots
    fig <- far$plot$correlation(y=dmat[[1]],data=dmat[[2]],
                         plots_per_page=3L,figsize=c(7,8),
                         dpi=100L,output_file=of)
}

In this example (see pdf files produced), we can see that a linear model should be sufficient to represent the relationship between the probability of deforestation and the standardized distance to the nearest road or town. On the contrary, it could be interesting to fit a polynomial model for the the standardized distance to previous deforestation (dist_defor variable) for which the relationship seems non-linear. Several models can be fitted and compared to see if a second-order or third-order polynomial relationship is justified.

Deforestation model

We propose to use the Binomial iCAR model (Vieilledent et al. 2014) to estimate the deforestation probability of a pixel given a set of environmental variables. The Binomial iCAR model is a linear Binomial logistic regression model including an intrinsic Conditional Autoregressive (iCAR) process to account for the spatial autocorrelation of the observations. Parameter inference is done in a hierarchical Bayesian framework. The .model_binomial_iCAR() function from the forestatrisk module includes a Metropolis-within-Gibbs algorithm written in pure C code to reduce computation time.

Figure caption: Parameter inference is done in a hierarchical Bayesian framework. Bayesian statistics rely on the Bayes’ theorem named after Reverend Thomas Bayes. Each parameter has a prior and an approximated posterior probability distribution from which we can compute the mean, standard deviation, credible intervals at 95%, etc.

For the deforestation process it is very important to take into account the spatial autocorrelation of the process with spatial random effects. Indeed, the selected fixed environmental variables are not able to fully explain the spatial variability of the deforestation process, especially when working at large geographical scales, such as the national or continental scale. Spatial random effects allow estimating a higher/lower probability of deforestation in a particular region (associated to unmeasurable or unknow factors) that is different from the mean probability of deforestation derived from the environmental factors included in the model. The Binomial iCAR model can be described as follow:

Ecological process

\[\begin{equation} y_i \sim \mathcal{B}inomial(\theta_i,t_i) \\ \text{logit}(\theta_i) = X_i \beta + \rho_{j(i)} \end{equation}\]

\(y_i\): random variable for the deforestation process (0 if no deforestation, 1 if deforestation)

\(\theta_i\): probability of deforestation

\(t_i\): number of trials (always 1 in our example)

\(X_i\): vector of values for environmental explicative variables

\(\beta\): vector of fixed effect parameters

\(\rho_j\): spatial random effect

\(j(i)\): index of the spatial entity for observation \(i\).

Spatial autocorrelation

The spatial autocorrelation is modelled with an intrinsic conditional autoregressive (iCAR) process:

\[\begin{equation} \rho_j \sim \mathcal{N}ormal(\mu_j,V_{\rho} / n_j) \end{equation}\]

\(\mu_j\): mean of \(\rho_{j'}\) in the neighborhood of \(j\).

\(V_{\rho}\): variance of the spatial random effects.

\(n_j\): number of neighbors for spatial entity \(j\).

Figure caption: Representation of the neighborhood for the intrinsic conditional autoregressive (iCAR) process. Target spatial cell \(j\) has 8 neighbors in this case. Several observations (black points, equivalent to pixel centers in our case) can be located in each spatial cell. Deforestation probability in one spatial cell \(j\) depends on deforestation probability in neighboring cells.

Spatial cells

Before running the model, we add a column indicating the number of trials for each observation (1 in our case as we are considering a Bernoulli process). We then remove any observation with non-available data (NA) from the data-set. We also compute the number of neighbors (nneigh) and the neighbor identifiers (adj) for each spatial cell using function .cellneigh from the forestatrisk module.

# Spatial cells for spatial-autocorrelation
neighborhood <- far$cellneigh_ctry(raster="data/model/fordefor2010.tif",
                                   vector="data/mada/mada38s.shp",
                                   csize=10L, rank=1L)
nneigh <- neighborhood[[1]]
adj <- neighborhood[[2]]
cell_in <- neighborhood[[3]]
ncell <- neighborhood[[4]]
# Udpate cell number in dataset
cell_rank <- vector()
for (i in 1:nrow(data_train)) {
  cell_rank[i] <- which(cell_in==data_train$cell[i])-1 # ! cells start at zero
}
data_train$cell <- cell_rank

Model formula

A model formula must also be defined to specify the explicative variables we want to include in the model. The formula allows specifying some variable transformations (such as standardization in our case). See the patsy module for more information. In our model, we included the following variables: location inside a protected area, altitude, distance to past deforestation (with a degree two polynomial), distance to forest edge, distance to nearest road and distance to nearest town. The formula must end with the name of the variable indicating the spatial cell for each observation (cell in our case).

# Formula
data_train$trials <- 1  # Set number of trials to one
formula <- paste0("I(1-fordefor2010) + trials ~ C(sapm) + scale(altitude) + scale(slope) +",
                  "scale(dist_defor) + np.power(scale(dist_defor),2) + ",
                  "scale(dist_edge) + ",
                  "scale(dist_road) + scale(dist_town) + cell")

Fitting model parameters

# Model
mod_binomial_iCAR <- far$model_binomial_iCAR(
  # Observations
  suitability_formula=formula, data=data_train,
  # Spatial structure
  n_neighbors=np_array(nneigh,dtype="int32"), neighbors=np_array(adj,dtype="int32"),
  # Environment
  eval_env=-1L,
  # Chains
  burnin=1000L, mcmc=1000L, thin=1L,
  # Starting values
  beta_start=-99)

Running the Gibbs sampler. It may be long, please keep cool :)

**********:10.0%, mean accept. rates= beta:0.061, rho:0.525
**********:20.0%, mean accept. rates= beta:0.153, rho:0.408
**********:30.0%, mean accept. rates= beta:0.199, rho:0.312
**********:40.0%, mean accept. rates= beta:0.241, rho:0.288
**********:50.0%, mean accept. rates= beta:0.233, rho:0.258
**********:60.0%, mean accept. rates= beta:0.269, rho:0.257
**********:70.0%, mean accept. rates= beta:0.280, rho:0.258
**********:80.0%, mean accept. rates= beta:0.257, rho:0.261
**********:90.0%, mean accept. rates= beta:0.286, rho:0.252
**********:100.0%, mean accept. rates= beta:0.283, rho:0.247

Model summary

Once the model has been fitted, we can print a summary of the model showing the parameter estimates. The 95% credible intervals obtained from the posterior distribution of each parameter, except distance to nearest town (dist_town), do not include zero, indicating that parameters are significantly different from zero. The variance of the spatial random effects (Vrho) is given together with the deviance value, which can be used to compare different statistical models (lower deviance is better). Looking at the parameter estimates, we can see that the deforestation probability is much lower inside protected areas and that deforestation probability decreases with altitude, slope, distance to past deforestation, forest edge, roads and towns. Parameter values are then coherent regarding the deforestation process and easy to interpret.

sink(file="output/summary_mod_binomial_iCAR.txt")
print(mod_binomial_iCAR)
sink()
print(mod_binomial_iCAR)
Binomial logistic regression with iCAR process
  Model: I(1 - fordefor2010) + trials ~ 1 + C(sapm) + scale(altitude) + scale(slope) + scale(dist_defor) + np.power(scale(dist_defor), 2) + scale(dist_edge) + scale(dist_road) + scale(dist_town) + cell
  Posteriors:
                                     Mean        Std     CI_low    CI_high
                     Intercept     -0.264     0.0833      -0.44    -0.0874
                C(sapm)[T.1.0]     -0.541        0.1     -0.742     -0.369
               scale(altitude)     -0.745     0.0663     -0.892      -0.61
                  scale(slope)     -0.161     0.0442     -0.255    -0.0811
             scale(dist_defor)     -0.863     0.0576     -0.974     -0.759
np.power(scale(dist_defor), 2)     0.0666    0.00813     0.0501     0.0817
              scale(dist_edge)     -0.552     0.0577     -0.662     -0.452
              scale(dist_road)      -0.19     0.0499     -0.285     -0.103
              scale(dist_town)     0.0327     0.0481    -0.0549      0.126
                          Vrho       8.62      0.591       7.54       9.68
                      Deviance   9.51e+03       60.9    9.4e+03   9.63e+03

Plot traces and posteriors

To check for the convergence of the Markov chain Monte Carlo (MCMC), we can plot the traces and the posterior distributions of the estimated parameters using method .plot() associated to the hSDM_binomial_iCAR class defined in the deforestprob module. This method returns the figures showing the traces and posterior distributions.

require(coda)
mcmc <- as.mcmc(mod_binomial_iCAR$mcmc)
plot(mcmc[,c(1:3,10,11)])

Forecasting deforestation

Resampling the spatial random effects

We use the model to predict the spatial probability of deforestation at the national scale for Madagascar. Before, doing so, we smooth the spatial random effects which have been estimated at a coarse resolution (10km in our example). To do so, we use the function .resample_rho() from the deforestprob module to resample the results at a finer resolution using a bilinear interpolation. The function writes a raster file to the disk with a resolution of the raster specified in the argument input_raster of the function (1km in our case). The function .resample_rho() returns a figure of the spatial random effects that can be plotted.

# Get the spatial random effects
rho <- rep(-9999,ncell)  # -9999 will be considered as nodata
rho[cell_in+1] <- mod_binomial_iCAR$rho
# Resample them
fig <- far$resample_rho(rho=r_to_py(rho), input_raster="data/model/fordefor2010.tif",
                 output_file="output/rho.tif",
                 csize_orig=10L, csize_new=1L)
# Plot random effects
fig <- far$plot$rho("output/rho_orig.tif",output_file="output/rho_orig.png")
fig <- far$plot$rho("output/rho.tif",output_file="output/rho.png")
# Plot with R
mada <- rgdal::readOGR(dsn="data/mada",layer="mada38s", verbose=FALSE)
r.rho_orig <- raster("output/rho_orig.tif")
r.rho <- raster("output/rho.tif")
rho_plot(r.rho_orig, mada, output_file="output/rho_orig_ggplot.png",
         quantiles_legend=c(0.025,0.975),width=4.5, height=8)
Results plotted to "output/rho_orig_ggplot.png"

rho_plot(r.rho, mada, output_file="output/rho_ggplot.png",
         quantiles_legend=c(0.025,0.975),width=4.5, height=8)
Results plotted to "output/rho_ggplot.png"

Computing spatial probability of deforestation

The .predict_raster_binomial_iCAR() function of the forestatrisk module can be used to predict the spatial probability of deforestation from an hSDM_binomial_iCAR model (i.e. an object of class hSDM_binomial_iCAR). The function writes a raster of predictions to the disk. The prediction is done by block to avoid memory problems for big datasets. Functions will return NA for pixels with no forest or for pixels with missing environmental variables.

far$predict_raster_binomial_iCAR(mod_binomial_iCAR, var_dir="data/model",
                                 input_cell_raster="output/rho.tif",
                                 input_forest_raster="data/model/fordefor2010.tif",
                                 output_file="output/pred_binomial_iCAR.tif",
                                 blk_rows=128L)

The raster of predictions can then be plotted.

fig <- far$plot$prob("output/pred_binomial_iCAR.tif",
                     output_file="output/pred_binomial_iCAR.png",
                     figsize=c(4,4))
knitr::include_graphics("output/pred_binomial_iCAR.png")

Predicting future forest cover

Given the spatial probability of deforestation and the number of hectares that should be deforested, we can predict the future forest cover using function .deforest() from the forestatrisk package. The number of hectares are converted into number of pixels to be deforested. Pixels with the highest probability of deforestation are deforested first. The function computes a probability threshold above which pixels are deforested.

In our example, we consider an annual deforestation of roughly 100,000 ha for Madagascar. Considering the period 2010-2050, this would correspond to 4 Mha of deforestation. This number can of course be more precise and refined considering various deforestation scenarios (demographic growth, economic development, etc.).

deforest <- far$deforest(input_raster="output/pred_binomial_iCAR.tif",
                         hectares=4000000,
                         output_file="output/forest_cover_2050.tif",
                         blk_rows=128L)

We can plot the predicted future forest cover in 2050 with leaflet. We first reproject the raster to WGS 84 / World Mercator projection (EPSG:3857) and we resample the map at 250 m using function gdalwarp from GDAL to obtain a lower resolution raster that can be transformed into a lower size image that can be plotted with leaflet.

gdalwarp -overwrite -s_srs EPSG:32738 -t_srs EPSG:3857 -tr 250 250 \
         -ot Byte -r near -co "COMPRESS=LZW" -co "PREDICTOR=2" -co "BIGTIFF=YES" \
         output/forest_cover_2050.tif output/forest_cover_2050_epsg3857_ov32.tif
Creating output file that is 3556P x 6534L.
Processing input file output/forest_cover_2050.tif.
Using internal nodata values (e.g. 255) for image output/forest_cover_2050.tif.
Copying nodata values from source output/forest_cover_2050.tif to destination output/forest_cover_2050_epsg3857_ov32.tif.
0...10...20...30...40...50...60...70...80...90...100 - done.

We also set the color of the map and the extent of the view for leaflet.

# Colors and extent view for leaflet
r <- raster("output/forest_cover_2050_epsg3857_ov32.tif")
pal <- colorFactor(c("red", "darkgreen"), c(0,1), na.color = "transparent")
r1 <- raster(nrows=1,ncols=1,ext=extent(340000,412000,7420000,7500000),crs=CRS("+init=epsg:32738"))
r2 <- projectExtent(r1, crs=CRS("+init=epsg:4326"))
ext2 <- extent(r2)

The red areas represent the deforestation on the period 2010-2050. The green areas represent the remaining forest in 2050. Most of the remaining forest in 2050 are inside the protected areas or located in remote areas, at high altitudes and far from roads and big cities (for example in the Tsaratanana mountain region and around the Masoala peninsula, north-east Madagascar).

# Leaflet map
m <- leaflet() %>% addTiles() %>%
  addRasterImage(r, colors=pal, opacity=0.8, project=FALSE) %>%
  fitBounds(ext2@xmin,ext2@ymin,ext2@xmax,ext2@ymax)
m

References

Vieilledent, Ghislain, Clovis Grinand, Fety A. Rakotomalala, Rija Ranaivosoa, Jean-Roger Rakotoarijaona, Thomas F. Allnutt, and Frédéric Achard. 2018. “Combining Global Tree Cover Loss Data with Historical National Forest Cover Maps to Look at Six Decades of Deforestation and Forest Fragmentation in Madagascar.” Biological Conservation 222:189–97. https://doi.org/10.1016/j.biocon.2018.04.008.

Vieilledent, Ghislain, Cory Merow, Jérôme Guélat, Andrew M. Latimer, Marc Kéry, Alan E. Gelfand, Adam M. Wilson, Frédéric Mortier, and John A. Silander Jr. 2014. hSDM: hierarchical Bayesian species distribution models. https://doi.org/10.5281/zenodo.594920.

---
title: "Modelling and forecasting deforestation"
author: "Ghislain Vieilledent"
date: "`r format(Sys.time(), '%d %B, %Y')`"
output:
  html_notebook:
    highlight: tango
    number_sections: no
    theme: spacelab
    toc: yes
    toc_float: yes
bibliography: bib/biblio.bib
biblio-style: bib/journal-of-applied-ecology.csl
---

```{r setup, include=FALSE}
library(knitr)
knitr::opts_chunk$set(
	echo=TRUE,
	message=FALSE,
	warning=FALSE,
	cache=FALSE
)
knitr::opts_knit$set(
  root.dir=rprojroot::find_rstudio_root_file()
)
```

## Introduction

```{r causes, echo=FALSE}
knitr::include_graphics("training/figures/causes.jpg")
```

Figure caption: **Main causes of deforestation in central Menabe.** **a-a'**: _Slash-and-burn agriculture ("hatsake") for peanut crop._ Peanut (a') is cultivated as a cash crop. Part of the production is at the destination of the national market but most of it is exported outside Madagascar, mainly for the Chinese market. **b-b'**: _Slash-and-burn agriculture for maize crop._ maize (b') is cultivated for auto-consumption and as a cash crop. The production of maize is at the destination of the national market and is used in particular to brew the national beers. **c-c'**: _Cyclone followed by uncontrolled fires._ Cyclone _"Fanele"_ (2009) caused tree mortality and accumulation of wood fuel on the ground. As a consequence, uncontrolled fires set on nearby pastures (c') spread over large areas of forest after 2009. **d-d'**: _Illegal logging._ Timbers are used for house and boat construction.

```{r cc-license, echo=FALSE}
knitr::include_graphics("training/figures/by-sa.png")
```

This tutorial is licensed under the [Creative Commons Attribution-ShareAlike 4.0 International License](http://creativecommons.org/licenses/by-sa/4.0/).

## Computing environment

### Installing Docker

Docker is used to run software packages called "containers". Containers are isolated from each other and bundle their own tools, libraries and configuration files. Containers are created from "images" that specify their precise contents.

To install Docker for your operating system, go to <https://docs.docker.com/install/>. For Windows user, go directly to <https://docs.docker.com/docker-for-windows/install/>.

### Docker image

For our tutorial, we have built a specific Docker image including R and Python:

- R computing environment: R 3.5.1, RStudio, and various R packages (ex. knitr, ggplot2, reticulate).
- Python computing environment: Python 3, and various Python packages (ex. numpy, pandas, matplotlib, and forestatrisk).

Some computations, such as computation by blocks on large raster files, were more easily coded and faster with Python. That's why we decided to use Python as a complementary language to R. The `reticulate` R package allow us to use Python functions from R and thus to work in a single computing environment within RStudio.

### Load the `forestatrisk` Docker image

Download the image from internet with the following command in a terminal: `docker pull ghislainv/docker-forestatrisk .` (do not forget the final ".")

Note: If the internet connection is slow, use a USB key and copy the image somewhere on your computer and load it with the following command: `docker load -i <path to image tar file>`

Run the container with `docker run -d -e PASSWORD=PWD -e ROOT=YES -p 8787:8787 -v ORIGIN_FOLDER:/home/rstudio ghislainv/docker-forestatrisk`.  

Note:

- Set your own password `PWD`, it cannot be `rstudio`.
- Replace `ORIGIN_FOLDER` with a folder on your local machine where you want to save the R scripts. For example `/c/Users/YOUR_NAME` or `/home/YOUR_NAME`.

Point your browser to `localhost:8787` to access RStudio interface. Log in with `rstudio/PWD`.

### Get the tutorial

In RStudio, open a terminal and clone the GitHub repository locally:  
`git clone https://github.com/ghislainv/forecasting-deforestation-Mada`

Navigate with RStudio in the folder named `forecasting-deforestation-Mada` and open the R Notebook file named `training.Rmd`. R Markdown files (`*.Rmd`) can be used in R to embbed text, code, table and figues inside a unique document. Code are organized into 'chunks' that can be executed independently and interactively. An R Notebook is an R Markdown document with output visible immediately beneath the input. 

I used this notebook to write the tutorial available at <https://ecology.ghislainv.fr/forecasting-deforestation-Mada>.

In the preambule, change the author and date with your name and today's date.

### Loading libraries in R

```{r R_libraries}
# Environmental variables
Sys.unsetenv("DISPLAY") # Remove DISPLAY for Python plot

# Libraries
require(reticulate)
require(glue)
require(curl)
require(dplyr)
require(broom)
require(ggplot2)
require(raster)
require(rasterVis)
require(rgdal)
require(leaflet)
```

```{r source_R}
# Source R plotting functions
source("R/dfp_plot.R")
```

### Importing Python modules in R

Specify the Python version to use and check that it is the right one.

```{r}
use_python("/usr/bin/python3.5")
py_config()
```

Import the Python modules into R.

```{r importpy}
far <- import("forestatrisk")
patsy <- import("patsy")
sm <- import("statsmodels.api")
smf <- import("statsmodels.formula.api")

```

### R/Python intro

#### R

##### Some links

- <https://www.rstudio.com/online-learning/#r-programming>
- <https://cran.r-project.org/doc/contrib/Paradis-rdebuts_en.pdf> (also in French: _fr.pdf)
- Cheat sheets: <https://www.rstudio.com/resources/cheatsheets>

##### Some exercises

```{r R_exercises}
a = c(20,5,6,2,9,12)
b = 1
c = a + b
print(c)
```


##### Simple regression

```{r R_regression}
# Generating data
nobs <- 100
x.seq <- runif(n=nobs,min=0,max=100)
a.target <- 2
b.target <- 2
sigma2.target <- 300
y.seq <- a.target + b.target*x.seq +
         rnorm(n=nobs,mean=0,sd=sqrt(sigma2.target))

# Data-set
Data <- as.data.frame(cbind(y.seq,x.seq))

# Plot
plot(x.seq,y.seq)

# Estimation
M <- lm(y.seq~x.seq,data=Data)
summary(M)
M$coefficients
var(M$residuals)
M$df

# Graphics 
x.pred <- seq(from=0,to=100,length.out=100) # We will predict y for 100 values of x
X.pred <- as.data.frame(x.pred)
names(X.pred) <- "x.seq"
Y.pred <- predict.lm(M,X.pred,interval="confidence",
                     type="response")

plot(x.seq,y.seq,col=grey(0.7),
     xlab="X",
     ylab="Y",
     main="Simple linear regression") # Row data

# Confidence envelops for predictive posterior
lines(x.pred,Y.pred[,1],col="red",lwd=3)
lines(x.pred,Y.pred[,2],col="blue",lwd=2,lty=2)
lines(x.pred,Y.pred[,3],col="blue",lwd=2,lty=2)

```

#### Python

##### Some links

- <https://www.python.org/about/gettingstarted>
- <https://www.pythonsheets.com>

##### Some exercises

You can execute Python code within a R notebooks.

```{python}
import numpy as np
a = np.array([20,5,6,2,9,12])
b = 1
c = a + b
print(type(c))
print(c)
```

## Preparing the data-set

### Downloading the geospatial data

To model the spatial probability of deforestation, we need a map of the past deforestation and maps of potential explicative environmental variables. Environmental variables can be derived from topography (altitude and slope), accessibility (distance to roads, towns, and forest edge), deforestation history (distance to previous deforestation) or landscape policy (protected area network) for example. In our example, we use the following variables :

```{r variables}
library(kableExtra)
tab1 <- read.table("training/tables/variables.txt",sep=",",header=TRUE)
names(tab1) <- c("Product","Source","Variable derived","Unit","Resolution (m)")
# Kable
knitr::kable(tab1, linesep="") %>%
  kable_styling(bootstrap_options = c("striped"))
```

1. <http://bioscenemada.cirad.fr>, deforestation maps from @Vieilledent2018
2. <http://srtm.csi.cgiar.org>, SRTM 90m Digital Elevation Database v4.1
3. <http://www.geofabrik.de>, data extracts from the OpenStreetMap project for Madagascar,
4. <http://rebioma.net>, SAPM ("Système des Aires Protégées à Madagascar"), 20/12/2010 version


```{r download_data}
# Make data directory
if (!dir.exists("data")) {
  dir.create("data")
  dir.create("data/model")
  dir.create("data/mada")
  dir.create("data/validation")
}

# Files
files <- c("fordefor2010.tif", "dist_edge.tif", "dist_defor.tif",
           "altitude.tif", "slope.tif", "aspect.tif",
           "dist_road.tif", "dist_town.tif", "dist_river.tif",
           "sapm.tif", "roads.kml", "towns.kml", "rivers.kml", "sapm.kml")

# Download model data
for (i in files) {
  if (!file.exists(glue("data/model/{i}"))) {
    f <- glue("https://zenodo.org/record/259582/files/{i}")
    curl_download(f, glue("data/model/{i}"), quiet=FALSE)
  }
}

# Download validation data
if (!file.exists("data/validation/for2017.tif")) {
  f <- "http://bioscenemada.cirad.fr/FileTransfer/for2017.tif"
  curl_download(f, "data/validation/for2017.tif", quiet=FALSE)
}

# Download mada outline
if (!file.exists("data/mada/mada38s.shp")) {
  f <- "http://bioscenemada.cirad.fr/FileTransfer/mada38s.zip"
  curl_download(f, "data/mada/mada38s.zip", quiet=FALSE)
  unzip("data/mada/mada38s.zip", exdir="data/mada")
}

```

In our example, `fordefor2010.tif` is a forest raster at 30m for the year 2010 considering the deforestation on the period 2000-2010 in Madagascar. We can plot this raster and zoom on a region with function `.plot.forest()` in the `forestatrisk` package to see the deforestation data. The remaining forest appears in green and the deforested areas appear in red.

```{r plot_fcc}
# Plot forest cover change 2000-2010
fig <- far$plot$fcc(input_fcc_raster="data/model/fordefor2010.tif",
             output_file="output/fcc.png",
             col=c(255,0,0,255),  # rgba color for deforestation
             figsize=c(5,5),
             dpi=150,
             zoom=c(340000,412000,7420000,7500000))
knitr::include_graphics("output/fcc.png")
```

### Sampling points

We use the function `.sample()` from the `forestatrisk` module to sample 10,000 points (pixel centers) in the deforested areas and in the remaining forest (20,000 points in total). The `input_forest_raster` argument defines the path to the forest raster including the deforested pixels (with value=0) and the remaining forest pixels (value=1) after a given period of time. The random seed can be set with argument `Seed` to reproduce the data of the random sampling.

The `.sample()` function also extracts information from environmental variables for each sampled point. Sampling is done by block to allow the computation on large study areas (e.g. country or continental scale) with a fine spatial resolution (e.g. 30m). The `var_dir` argument indicates the directory including all the environmental raster files (they must be GeoTIFF files with extension `.tif`) that we want to test.

The `.sample()` function identifies the spatial cell for each sample point (sample point are grouped within a spatial cell). Spatial cells and grouped observations are used to estimate the spatial autocorrelation of the deforestation process. The `csize` argument define the width (in km) of the square spatial cells. Each spatial cell will be attributed a parameter. To avoid estimating too many parameters, width of the square spatial cells must be sufficiently large. Both points sampling and extraction of environmental data are done by block to avoid memory problems for big datasets.

```{r sampling}
# Make output directory
if (!dir.exists("output")) {
  dir.create("output")
}

# Training data-set
if (!file.exists("output/sample.txt")) {
  samp <- far$sample(nsamp=10000L, Seed=1234L, csize=10L,
                     var_dir="data/model",
                     input_forest_raster="fordefor2010.tif",
                     output_file="output/sample.txt",
                     blk_rows=1L)
}
samp <- read.table("output/sample.txt", header=TRUE, sep=",")
set.seed(1234)
train <- sample(1:20000, size=10000, replace=FALSE)
data_train <- samp[train,] %>% dplyr::filter(complete.cases(.))
data_valid <- samp[-train,] %>% dplyr::filter(complete.cases(.))
head(data_train)

```

Sampled observations can be plotted using function `.plot.obs()` from the deforestprob module. Dark red dots indicate deforestation observations and dark green dots indicate forest observations.

```{r plot_sample}
# Plot sample points
fig <- far$plot$obs(sample=data_train,
             name_forest_var="fordefor2010",
             input_fcc_raster="data/model/fordefor2010.tif",
             output_file="output/obs.png",
             zoom=c(340000,412000,7420000,7500000),
             figsize=c(5,5),#c(11.69,8.27),
             s=5,dpi=300)
knitr::include_graphics("output/obs.png")
```


### Descriptive statistics

Before modelling the deforestation, it is important to look at the relationship between environmental variables and deforestation. Using formulas from the [`patsy`](https://github.com/pydata/patsy) Python module, we can specify the relationships that we want to look at. In the example below, we plot the relationships between some continuous environmental variables and the probability of deforestation using function `.plot.correlation()` from the `forestatrisk` package. Note that -1 must be set at the end of the formula. The function `.correlation()` returns a serie of graphics that can be analyzed to choose the right relationship for each continuous variable (linear or polynomial for example).

```{r correlations}
# Descriptive statistics

# Model formulas
formula_1 <- paste0("fordefor2010 ~ dist_road + dist_town + dist_defor +",
                    "dist_river + dist_edge + altitude + slope + aspect - 1")
# Standardized variables (mean=0, std=1)
formula_2 <- paste0("fordefor2010 ~ scale(dist_road) + scale(dist_town) +",
                    "scale(dist_defor) + scale(dist_river) + scale(dist_edge) +",
                    "scale(altitude) + scale(slope) + scale(aspect) - 1")
formulas <- c(formula_1, formula_2)

# List to store figures
corr_fig <- list()

# Loop on formulas
for (f in 1:length(formulas)) {
    # Output file
    of <- glue("output/correlation_{f}.pdf")
    # Data
    dmat <- patsy$dmatrices(formulas[f], data=data_train, eval_env=-1L,
                                  return_type="dataframe")
    # Plots
    fig <- far$plot$correlation(y=dmat[[1]],data=dmat[[2]],
                         plots_per_page=3L,figsize=c(7,8),
                         dpi=100L,output_file=of)
}
```

In this example (see pdf files produced), we can see that a linear model should be sufficient to represent the relationship between the probability of deforestation and the standardized distance to the nearest road or town. On the contrary, it could be interesting to fit a polynomial model for the the standardized distance to previous deforestation (`dist_defor` variable) for which the relationship seems non-linear. Several models can be fitted and compared to see if a second-order or third-order polynomial relationship is justified.

## Deforestation model

We propose to use the **Binomial iCAR** model [@Vieilledent2014] to estimate the deforestation probability of a pixel given a set of environmental variables. The Binomial iCAR model is a linear Binomial logistic regression model including an intrinsic Conditional Autoregressive (iCAR) process to account for the spatial autocorrelation of the observations. Parameter inference is done in a hierarchical Bayesian framework. The `.model_binomial_iCAR()` function from the `forestatrisk` module includes a Metropolis-within-Gibbs algorithm written in pure C code to reduce computation time.

```{r Bayes, echo=FALSE}
knitr::include_graphics("training/figures/Bayes.jpg")
```

Figure caption: **Parameter inference is done in a hierarchical Bayesian framework.** Bayesian statistics rely on the [Bayes' theorem](https://en.wikipedia.org/wiki/Bayes'_theorem) named after [Reverend Thomas Bayes](https://en.wikipedia.org/wiki/Thomas_Bayes). Each parameter has a prior and an approximated posterior probability distribution from which we can compute the mean, standard deviation, credible intervals at 95%, etc.

For the deforestation process it is _**very important to take into account the spatial autocorrelation of the process**_ with spatial random effects. Indeed, the selected fixed environmental variables are not able to fully explain the spatial variability of the deforestation process, especially when working at large geographical scales, such as the national or continental scale. Spatial random effects allow estimating a higher/lower probability of deforestation in a particular region (associated to **unmeasurable** or **unknow** factors) that is different from the mean probability of deforestation derived from the environmental factors included in the model. The Binomial iCAR model can be described as follow:

**Ecological process**

\begin{equation}
y_i \sim \mathcal{B}inomial(\theta_i,t_i) \\
\text{logit}(\theta_i) = X_i \beta + \rho_{j(i)}
\end{equation}

$y_i$: random variable for the deforestation process (0 if no deforestation, 1 if deforestation)

$\theta_i$: probability of deforestation

$t_i$: number of trials (always 1 in our example)

$X_i$: vector of values for environmental explicative variables

$\beta$: vector of fixed effect parameters

$\rho_j$: spatial random effect
  
$j(i)$: index of the spatial entity for observation $i$.
  
**Spatial autocorrelation**
  
The spatial autocorrelation is modelled with an intrinsic conditional autoregressive (iCAR) process:

\begin{equation}
\rho_j \sim \mathcal{N}ormal(\mu_j,V_{\rho} / n_j)
\end{equation}

$\mu_j$: mean of $\rho_{j'}$ in the neighborhood of $j$.
  
$V_{\rho}$: variance of the spatial random effects.
  
$n_j$: number of neighbors for spatial entity $j$.

```{r iCAR_process, echo=FALSE}
knitr::include_graphics("training/figures/iCAR.png")
```

Figure caption: **Representation of the neighborhood for the intrinsic conditional autoregressive (iCAR) process.** Target spatial cell $j$ has 8 neighbors in this case. Several observations (black points, equivalent to pixel centers in our case) can be located in each spatial cell. Deforestation probability in one spatial cell $j$ depends on deforestation probability in neighboring cells.

### Spatial cells

Before running the model, we add a column indicating the number of trials for each observation (1 in our case as we are considering a Bernoulli process). We then remove any observation with non-available data (NA) from the data-set. We also compute the number of neighbors (`nneigh`) and the neighbor identifiers (`adj`) for each spatial cell using function `.cellneigh` from the `forestatrisk` module.

```{r spatial_cells}
# Spatial cells for spatial-autocorrelation
neighborhood <- far$cellneigh_ctry(raster="data/model/fordefor2010.tif",
                                   vector="data/mada/mada38s.shp",
                                   csize=10L, rank=1L)
nneigh <- neighborhood[[1]]
adj <- neighborhood[[2]]
cell_in <- neighborhood[[3]]
ncell <- neighborhood[[4]]

# Udpate cell number in dataset
cell_rank <- vector()
for (i in 1:nrow(data_train)) {
  cell_rank[i] <- which(cell_in==data_train$cell[i])-1 # ! cells start at zero
}
data_train$cell <- cell_rank

```

### Model formula

A model formula must also be defined to specify the explicative variables we want to include in the model. The formula allows specifying some variable transformations (such as standardization in our case). See the [`patsy`](https://patsy.readthedocs.io/en/latest/) module for more information. In our model, we included the following variables: location inside a protected area, altitude, distance to past deforestation (with a degree two polynomial), distance to forest edge, distance to nearest road and distance to nearest town. The formula must end with the name of the variable indicating the spatial cell for each observation (`cell` in our case).

```{r formula}
# Formula
data_train$trials <- 1  # Set number of trials to one
formula <- paste0("I(1-fordefor2010) + trials ~ C(sapm) + scale(altitude) + scale(slope) +",
                  "scale(dist_defor) + np.power(scale(dist_defor),2) + ",
                  "scale(dist_edge) + ",
                  "scale(dist_road) + scale(dist_town) + cell")


```

### Fitting model parameters

```{r model}
# Model
mod_binomial_iCAR <- far$model_binomial_iCAR(
  # Observations
  suitability_formula=formula, data=data_train,
  # Spatial structure
  n_neighbors=np_array(nneigh,dtype="int32"), neighbors=np_array(adj,dtype="int32"),
  # Environment
  eval_env=-1L,
  # Chains
  burnin=1000L, mcmc=1000L, thin=1L,
  # Starting values
  beta_start=-99)

```

### Model summary

Once the model has been fitted, we can print a summary of the model showing the parameter estimates. The 95% credible intervals obtained from the posterior distribution of each parameter, except distance to nearest town (`dist_town`), do not include zero, indicating that parameters are significantly different from zero. The variance of the spatial random effects (`Vrho`) is given together with the deviance value, which can be used to compare different statistical models (lower deviance is better). Looking at the parameter estimates, we can see that the deforestation probability is much lower inside protected areas and that deforestation probability decreases with altitude, slope, distance to past deforestation, forest edge, roads and towns. Parameter values are then coherent regarding the deforestation process and easy to interpret.

```{r}
sink(file="output/summary_mod_binomial_iCAR.txt")
print(mod_binomial_iCAR)
sink()
print(mod_binomial_iCAR)
```

### Plot traces and posteriors

To check for the convergence of the Markov chain Monte Carlo (MCMC), we can plot the traces and the posterior distributions of the estimated parameters using method `.plot()` associated to the `hSDM_binomial_iCAR` class defined in the `deforestprob` module. This method returns the figures showing the traces and posterior distributions.

```{r plot_with_py, echo=FALSE}
traces_fig <- mod_binomial_iCAR$plot(output_file="output/mcmc.pdf",
                                     plots_per_page=3L,
                                     figsize=c(9,6),
                                     dpi=100)
```

```{r plot_with_r}
require(coda)
mcmc <- as.mcmc(mod_binomial_iCAR$mcmc)
plot(mcmc[,c(1:3,10,11)])
```

## Forecasting deforestation

### Resampling the spatial random effects

We use the model to predict the spatial probability of deforestation at the national scale for Madagascar. Before, doing so, we smooth the spatial random effects which have been estimated at a coarse resolution (10km in our example). To do so, we use the function `.resample_rho()` from the `deforestprob` module to resample the results at a finer resolution using a bilinear interpolation. The function writes a raster file to the disk with a resolution of the raster specified in the argument `input_raster` of the function (1km in our case). The function `.resample_rho()` returns a figure of the spatial random effects that can be plotted.

```{r rho, fig.height=7, message=FALSE, warning=FALSE}
# Get the spatial random effects
rho <- rep(-9999,ncell)  # -9999 will be considered as nodata
rho[cell_in+1] <- mod_binomial_iCAR$rho

# Resample them
fig <- far$resample_rho(rho=r_to_py(rho), input_raster="data/model/fordefor2010.tif",
                 output_file="output/rho.tif",
                 csize_orig=10L, csize_new=1L)

# Plot random effects
fig <- far$plot$rho("output/rho_orig.tif",output_file="output/rho_orig.png")
fig <- far$plot$rho("output/rho.tif",output_file="output/rho.png")

# Plot with R
mada <- rgdal::readOGR(dsn="data/mada",layer="mada38s", verbose=FALSE)
r.rho_orig <- raster("output/rho_orig.tif")
r.rho <- raster("output/rho.tif")
rho_plot(r.rho_orig, mada, output_file="output/rho_orig_ggplot.png",
         quantiles_legend=c(0.025,0.975),width=4.5, height=8)
rho_plot(r.rho, mada, output_file="output/rho_ggplot.png",
         quantiles_legend=c(0.025,0.975),width=4.5, height=8)

```

### Computing spatial probability of deforestation

The `.predict_raster_binomial_iCAR()` function of the `forestatrisk` module can be used to predict the spatial probability of deforestation from an **hSDM_binomial_iCAR** model (i.e. an object of class `hSDM_binomial_iCAR`). The function writes a raster of predictions to the disk. The prediction is done by block to avoid memory problems for big datasets. Functions will return NA for pixels with no forest or for pixels with missing environmental variables.

```{r compute_predictions}
far$predict_raster_binomial_iCAR(mod_binomial_iCAR, var_dir="data/model",
                                 input_cell_raster="output/rho.tif",
                                 input_forest_raster="data/model/fordefor2010.tif",
                                 output_file="output/pred_binomial_iCAR.tif",
                                 blk_rows=128L)

```

The raster of predictions can then be plotted.

```{r plot_predictions}
fig <- far$plot$prob("output/pred_binomial_iCAR.tif",
                     output_file="output/pred_binomial_iCAR.png",
                     figsize=c(4,4))
knitr::include_graphics("output/pred_binomial_iCAR.png")
```

### Predicting future forest cover

Given the spatial probability of deforestation and the number of hectares that should be deforested, we can predict the future forest cover using function `.deforest()` from the `forestatrisk` package. The number of hectares are converted into number of pixels to be deforested. Pixels with the highest probability of deforestation are deforested first. The function computes a probability threshold above which pixels are deforested.

In our example, we consider an annual deforestation of roughly 100,000 ha for Madagascar. Considering the period 2010-2050, this would correspond to 4 Mha of deforestation. This number can of course be more precise and refined considering various deforestation scenarios (demographic growth, economic development, etc.).

```{r compute_future_forest_cover}
deforest <- far$deforest(input_raster="output/pred_binomial_iCAR.tif",
                         hectares=4000000,
                         output_file="output/forest_cover_2050.tif",
                         blk_rows=128L)
```

We can plot the predicted future forest cover in 2050 with [leaflet](http://rstudio.github.io/leaflet/). We first reproject the raster to WGS 84 / World Mercator projection ([EPSG:3857](http://spatialreference.org/ref/sr-org/7483/)) and we resample the map at 250 m using function `gdalwarp` from `GDAL` to obtain a lower resolution raster that can be transformed into a lower size image that can be plotted with leaflet.

```{bash}
gdalwarp -overwrite -s_srs EPSG:32738 -t_srs EPSG:3857 -tr 250 250 \
         -ot Byte -r near -co "COMPRESS=LZW" -co "PREDICTOR=2" -co "BIGTIFF=YES" \
         output/forest_cover_2050.tif output/forest_cover_2050_epsg3857_ov32.tif
```

We also set the color of the map and the extent of the view for leaflet.

```{r}
# Colors and extent view for leaflet
r <- raster("output/forest_cover_2050_epsg3857_ov32.tif")
pal <- colorFactor(c("red", "darkgreen"), c(0,1), na.color = "transparent")
r1 <- raster(nrows=1,ncols=1,ext=extent(340000,412000,7420000,7500000),crs=CRS("+init=epsg:32738"))
r2 <- projectExtent(r1, crs=CRS("+init=epsg:4326"))
ext2 <- extent(r2)
```

The red areas represent the deforestation on the period 2010-2050. The green areas represent the remaining forest in 2050. Most of the remaining forest in 2050 are inside the protected areas or located in remote areas, at high altitudes and far from roads and big cities (for example in the Tsaratanana mountain region and around the Masoala peninsula, north-east Madagascar).

```{r fcc_leaflet}
# Leaflet map
m <- leaflet() %>% addTiles() %>%
  addRasterImage(r, colors=pal, opacity=0.8, project=FALSE) %>%
  fitBounds(ext2@xmin,ext2@ymin,ext2@xmax,ext2@ymax)
m
```

## References